Missing data imputation, classification, prediction and average treatment effect estimation via Random Recursive Partitioning

Authors

  • Stefano M. Iacus
  • Giuseppe Porro
Abstract

In this paper we describe some applications of the Random Recursive Partitioning (RRP) method. This method generates a proximity matrix that can be used in nonparametric hot-deck missing data imputation, classification, prediction, average treatment effect estimation and, more generally, in matching problems. RRP is a Monte Carlo procedure that randomly generates non-empty recursive partitions of the data. It measures the proximity between two observations as the empirical frequency, over all replications, with which they fall in the same cell of these random partitions. RRP also works in the presence of missing data and is invariant under monotonic transformations of the data. No other formal properties of the method are known yet; therefore, Monte Carlo experiments are provided to explore its performance. Companion software is available as a package for the R statistical environment.
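The abstract describes RRP only at a high level, so the following is a minimal NumPy sketch under stated assumptions, not the authors' algorithm or their R package. Assumed here: each split draws a variable uniformly at random and a cut uniformly among the gaps between consecutive distinct observed values. Because the cut depends only on ranks, the sketch is invariant under increasing transformations. A cell stops splitting once it has fewer than `2 * min_size` rows, and partitions are built on fully observed covariates only. The names `random_partition`, `rrp_proximity`, `hot_deck_impute`, `min_size` and `n_rep` are hypothetical.

```python
import numpy as np


def random_partition(X, rng, min_size=5):
    """Hypothetical sketch: one random recursive partition of the rows of X.

    Returns a list of non-empty index arrays (cells) that together cover all rows.
    Assumptions (not taken from the paper): the split variable is drawn uniformly,
    the cut is drawn uniformly among gaps between consecutive distinct observed
    values, and a cell stops splitting once it has fewer than 2 * min_size rows.
    """
    cells, stack = [], [np.arange(X.shape[0])]
    while stack:
        idx = stack.pop()
        if len(idx) < 2 * min_size:
            cells.append(idx)
            continue
        j = rng.integers(X.shape[1])          # random split variable
        values = np.unique(X[idx, j])         # sorted distinct values in this cell
        if len(values) < 2:                   # cannot split on a constant column
            cells.append(idx)
            continue
        # Cut between values[k] and values[k + 1]: both children are non-empty,
        # and the choice depends only on ranks (invariant to increasing transforms).
        cut = values[rng.integers(len(values) - 1)]
        left = X[idx, j] <= cut
        stack.append(idx[left])
        stack.append(idx[~left])
    return cells


def rrp_proximity(X, n_rep=250, min_size=5, seed=0):
    """Share of replications in which each pair of rows falls in the same cell."""
    rng = np.random.default_rng(seed)
    n = X.shape[0]
    prox = np.zeros((n, n))
    for _ in range(n_rep):
        for cell in random_partition(X, rng, min_size):
            prox[np.ix_(cell, cell)] += 1.0
    return prox / n_rep


def hot_deck_impute(y, prox):
    """Fill each NaN in y with the value of the most proximate observed donor."""
    y = np.asarray(y, dtype=float).copy()
    missing = np.isnan(y)
    donors = np.flatnonzero(~missing)
    for i in np.flatnonzero(missing):
        y[i] = y[donors[np.argmax(prox[i, donors])]]
    return y


# Usage on synthetic data: two well-separated clusters, two labels hidden.
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(0, 1, (50, 2)), rng.normal(5, 1, (50, 2))])
y = np.r_[np.zeros(50), np.ones(50)]
y_obs = y.copy()
y_obs[[3, 60]] = np.nan
prox = rrp_proximity(X, n_rep=100)
y_hat = hot_deck_impute(y_obs, prox)
print(y_hat[[3, 60]])
```

Classification can reuse the same pattern: the unknown labels of new observations are treated as missing and filled from the most proximate labelled observation. Likewise, a matching step for treatment-effect estimation could pair each treated unit with the control unit of highest proximity. Both are illustrations of how a proximity matrix can be used, not the paper's exact estimators.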

Similar resources

Predicting Implantation Outcome of In Vitro Fertilization and Intracytoplasmic Sperm Injection Using Data Mining Techniques

Objective: The main purpose of this article is to choose the best predictive model for IVF/ICSI classification and to calculate the probability of IVF/ICSI success for each couple using artificial intelligence. We also aimed to find the most effective factors for predicting ART success in infertile couples. Materials and Methods: In this cross-sectional study, the data of 486 patients are colle...

Missing data imputation in multivariable time series data

Multivariate time series data are found in a variety of fields such as bioinformatics, biology, genetics, astronomy, geography and finance. Many time series datasets contain missing data. Multivariate time series missing data imputation is a challenging topic and needs to be carefully considered before learning or predicting time series. Much research has been done on the use of diffe...

Analysis of missing observations in a study of the effect of different doses of vitamin D supplementation on insulin resistance during pregnancy

Introduction: The aim of this study was to impute missing data and to compare the effect of different doses of vitamin D supplementation on insulin resistance during pregnancy. Methods: A clinical trial study was done on 104 women with diabetes and gestational age less than 12 weeks between 1391 and...

Evaluating the accuracy of genomic prediction under different genomic architectures of quantitative and threshold traits with imputation of simulated genomic data using the random forest method

Genomic selection is a promising challenge for discovering genetic variants influencing quantitative and threshold traits for improving the genetic gain and accuracy of genomic prediction in animal breeding. Since a proportion of genotypes is generally uncalled, prediction of genomic accuracy requires imputation of missing genotypes. The objectives of this study were (1) to quantify...

Partially linear varying coefficient models with missing at random responses

This paper considers partially linear varying coefficient models when the response variable is missing at random. The paper uses imputation techniques to develop an omnibus specification test. The test is based on a simple modification of a Cramer von Mises functional that overcomes the curse of dimensionality often associated with the standard Cramer von Mises functional. The paper also consid...

Publication date: 2006